racial category
Racial/Ethnic Categories in AI and Algorithmic Fairness: Why They Matter and What They Represent
Racial diversity has become increasingly discussed within the AI and algorithmic fairness literature, yet little attention is focused on justifying the choices of racial categories and understanding how people are racialized into these chosen racial categories. Even less attention is given to how racial categories shift and how the racialization process changes depending on the context of a dataset or model. An unclear understanding of who comprises the racial categories chosen and how people are racialized into these categories can lead to varying interpretations of these categories. These varying interpretations can lead to harm when the understanding of racial categories and the racialization process is misaligned from the actual racialization process and racial categories used. Harm can also arise if the racialization process and racial categories used are irrelevant or do not exist in the context they are applied.
The utilization of racial and ethnic categories in the development of datasets and models facilitates the inclusion and documentation of diverse perspectives. Racial and ethnic categories are especially crucial for datasets and models in which race and ethnicity serve as relevant factors, may act as confounding variables, or enable the ability to audit for fairness using race and ethnicity for fairness purposes. For example, understanding the racial and/or ethnic target of hate speech is crucial for understanding the impact of hate speech, as hate speech can differ based on the race and/or ethnicity of the target[48]. Similarly, in health, race is correlated with health outcomes[6], and knowledge of a patient's race and ethnicity can help contextualize the patient's experience and health history[53]. In algorithmic fairness settings, knowledge of
Evaluating the Fairness of the MIMIC-IV Dataset and a Baseline Algorithm: Application to the ICU Length of Stay Prediction
This paper uses the MIMIC-IV dataset to examine the fairness and bias in an XGBoost binary classification model predicting the Intensive Care Unit (ICU) length of stay (LOS). Highlighting the critical role of the ICU in managing critically ill patients, the study addresses the growing strain on ICU capacity. It emphasizes the significance of LOS prediction for resource allocation. The research reveals class imbalances in the dataset across demographic attributes and employs data preprocessing and feature extraction. While the XGBoost model performs well overall, disparities across race and insurance attributes reflect the need for tailored assessments and continuous monitoring. The paper concludes with recommendations for fairness-aware machine learning techniques for mitigating biases and the need for collaborative efforts among healthcare professionals and data scientists.
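The per-group assessment described above can be illustrated with a minimal sketch. This is not the paper's evaluation code: the function names `group_rates` and `max_gap`, the group labels, and the toy labels and predictions are all hypothetical, and the two metrics shown (selection rate and true positive rate) are only one possible choice for comparing a binary classifier across demographic groups.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Compute selection rate and true positive rate for each group.

    y_true and y_pred are sequences of 0/1 labels; groups holds one
    group label per example (hypothetical attribute values here).
    """
    counts = defaultdict(lambda: {"n": 0, "pred_pos": 0, "pos": 0, "tp": 0})
    for yt, yp, g in zip(y_true, y_pred, groups):
        c = counts[g]
        c["n"] += 1            # examples in the group
        c["pred_pos"] += yp    # predicted positives
        c["pos"] += yt         # actual positives
        c["tp"] += yt * yp     # true positives
    rates = {}
    for g, c in counts.items():
        rates[g] = {
            "selection_rate": c["pred_pos"] / c["n"],
            # TPR is undefined when a group has no actual positives.
            "tpr": c["tp"] / c["pos"] if c["pos"] else float("nan"),
        }
    return rates


def max_gap(rates, metric):
    """Largest difference in a metric between any two groups.

    Assumes the metric is defined (not NaN) for every group.
    """
    values = [r[metric] for r in rates.values()]
    return max(values) - min(values)


# Toy data, invented purely for illustration.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = group_rates(y_true, y_pred, groups)
for name, r in sorted(rates.items()):
    print(name, r)
print("TPR gap:", max_gap(rates, "tpr"))
```

In this toy data both groups have the same selection rate while their true positive rates differ, which is the kind of disparity an aggregate score can hide.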
Addressing Census data problems in race imputation via fully Bayesian Improved Surname Geocoding and name supplements
Imai, Kosuke, Olivella, Santiago, Rosenman, Evan T. R.
Prediction of an individual's race and ethnicity plays an important role in social science and public health research. Examples include studies of racial disparity in health and voting. Recently, Bayesian Improved Surname Geocoding (BISG), which uses Bayes' rule to combine information from Census surname files with the geocoding of an individual's residence, has emerged as a leading methodology for this prediction task. Unfortunately, BISG suffers from two Census data problems that contribute to unsatisfactory predictive performance for minorities. First, the decennial Census often contains zero counts for minority racial groups in the Census blocks where some members of those groups reside. Second, because the Census surname files only include frequent names, many surnames -- especially those of minorities -- are missing from the list. To address the zero counts problem, we introduce a fully Bayesian Improved Surname Geocoding (fBISG) methodology that accounts for potential measurement error in Census counts by extending the naive Bayesian inference of the BISG methodology to full posterior inference. To address the missing surname problem, we supplement the Census surname data with additional data on last, first, and middle names taken from the voter files of six Southern states where self-reported race is available. Our empirical validation shows that the fBISG methodology and name supplements significantly improve the accuracy of race imputation across all racial groups, and especially for Asians. The proposed methodology, together with additional name data, is available via the open-source software WRU.
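The Bayes' rule step behind standard BISG can be sketched in a few lines. This is a simplified illustration, not the WRU interface and not the fBISG method: it assumes surname and location are conditionally independent given race, so that P(race | surname, location) is proportional to P(race | surname) times P(location | race). The function name `bisg_posterior`, the group labels, and every probability below are hypothetical.

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Combine surname and location evidence with Bayes' rule.

    Both arguments are dicts keyed by the same group labels:
      p_race_given_surname[r] = P(race = r | surname)
      p_geo_given_race[r]     = P(location | race = r)
    Returns the normalized posterior P(race = r | surname, location).
    """
    unnormalized = {
        r: p_race_given_surname[r] * p_geo_given_race[r]
        for r in p_race_given_surname
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("all groups have zero probability; cannot normalize")
    return {r: v / total for r, v in unnormalized.items()}


# Hypothetical inputs, invented for illustration only.
surname_probs = {"group_a": 0.6, "group_b": 0.3, "group_c": 0.1}
# A zero here stands in for a Census block with a zero count for group_c.
geo_probs = {"group_a": 0.001, "group_b": 0.004, "group_c": 0.0}

posterior = bisg_posterior(surname_probs, geo_probs)
for group, prob in posterior.items():
    print(group, round(prob, 3))
```

The zero entry for `group_c` forces its posterior to zero regardless of the surname evidence, which illustrates the zero counts problem the abstract describes.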
What Germany's Lack of Race Data Means During a Pandemic
"What do you think the rate of Covid-19 is for us?" This is the question that many Black people living in Berlin asked me at the beginning of March 2020. The answer: We don't know. Unlike other countries, notably the United States and the United Kingdom, the German government does not record racial identity information in official documents and statistics. Due to the country's history with the Holocaust, calling Rasse (race) by its name has long been contested.
Bias and Fairness in Machine Learning, Part 3: building a bias-aware model
Let's begin constructing a more bias-aware model using two feature engineering techniques. We will begin by applying a familiar transformation to construct a new, less-biased column and then move on to a feature extraction method from the book. Our goal is to minimize the bias of our model without sacrificing a great deal of model performance. We're going to do something similar to the Box-Cox transformation to transform some of our features so that they appear more normal. To set up why, we first have to investigate why our model is under-predicting recidivism for non-African-American people.
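For readers unfamiliar with the Box-Cox transformation mentioned above, here is a minimal sketch of its standard definition with a fixed lambda. It is not the article's code, and the article only says it will use something similar: the function name `box_cox`, the feature values, and the choice of lambda are hypothetical, and in practice lambda is usually estimated from the data.

```python
import math


def box_cox(values, lam):
    """Apply the Box-Cox transform with a fixed lambda.

    y = (x**lam - 1) / lam   if lam != 0
    y = ln(x)                if lam == 0
    The transform is only defined for strictly positive inputs.
    """
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


# Hypothetical right-skewed feature values, invented for illustration.
skewed = [1.0, 2.0, 4.0, 8.0, 64.0]
print(box_cox(skewed, 0))    # log transform
print(box_cox(skewed, 0.5))  # square-root-like transform
```

Count-style features often contain zeros, so they would need a shift (or a variant that accepts non-positive values) before a transform like this could be applied.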
Racial categories in machine learning
Benthall, Sebastian, Haynes, Bruce D.
Controversies around race and machine learning have sparked debate among computer scientists over how to design machine learning systems that guarantee fairness. These debates rarely engage with how racial identity is embedded in our social experience, making for sociological and psychological complexity. This complexity challenges the paradigm of considering fairness to be a formal property of supervised learning with respect to protected personal attributes. Racial identity is not simply a personal subjective quality. For people labeled "Black" it is an ascribed political category that has consequences for social differentiation embedded in systemic patterns of social inequality achieved through both social and spatial segregation. In the United States, racial classification can best be understood as a system of inherently unequal status categories that places whites as the most privileged category while signifying the Negro/black category as stigmatized. Social stigma is reinforced through the unequal distribution of societal rewards and goods along racial lines that is reinforced by state, corporate, and civic institutions and practices. This creates a dilemma for society and designers: be blind to racial group disparities and thereby reify racialized social inequality by no longer measuring systemic inequality, or be conscious of racial categories in a way that itself reifies race. We propose a third option. By preceding group fairness interventions with unsupervised learning to dynamically detect patterns of segregation, machine learning systems can mitigate the root cause of social disparities, social segregation and stratification, without further anchoring status categories of disadvantage.